Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 24649092 |
| Missing cells | 27888939 |
| Missing cells (%) | 5.7% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.7 GiB |
| Average record size in memory | 160.0 B |
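The derived figures in this table can be re-checked from counts quoted elsewhere in the report. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by rows × columns, and that total memory is roughly the average record size times the row count. The variable names are illustrative only.

```python
# Re-derive the summary figures from counts quoted in the report.
n_rows = 24_649_092
n_cols = 20

# Per-column missing counts as listed in the report's "Missing" alerts.
missing_by_column = {
    "passenger_count": 809_967,
    "RatecodeID": 809_967,
    "store_and_fwd_flag": 809_967,
    "congestion_surcharge": 809_967,
    "airport_fee": 24_649_071,
}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Reported average record size (160 B) times the row count, expressed in GiB.
total_gib = 160.0 * n_rows / 2**30

print(missing_cells)
print(f"{missing_pct:.1f}%")
print(f"{total_gib:.1f} GiB")
```

The five per-column counts add up to the reported number of missing cells, so every other column is complete.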
Variable types
| Numeric | 14 |
|---|---|
| Categorical | 3 |
| DateTime | 2 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| airport_fee has constant value "0.0" | Constant |
| VendorID is highly correlated with airport_fee | High correlation |
| trip_distance is highly correlated with fare_amount and 1 other fields | High correlation |
| payment_type is highly correlated with improvement_surcharge | High correlation |
| fare_amount is highly correlated with total_amount | High correlation |
| extra is highly correlated with mta_tax and 1 other fields | High correlation |
| mta_tax is highly correlated with extra and 1 other fields | High correlation |
| tip_amount is highly correlated with payment_type | High correlation |
| improvement_surcharge is highly correlated with payment_type and 1 other fields | High correlation |
| total_amount is highly correlated with fare_amount and 2 other fields | High correlation |
| store_and_fwd_flag is highly correlated with airport_fee | High correlation |
| airport_fee is highly correlated with improvement_surcharge and 2 other fields | High correlation |
| congestion_surcharge is highly correlated with improvement_surcharge | High correlation |
| passenger_count has 809967 (3.3%) missing values | Missing |
| RatecodeID has 809967 (3.3%) missing values | Missing |
| store_and_fwd_flag has 809967 (3.3%) missing values | Missing |
| congestion_surcharge has 809967 (3.3%) missing values | Missing |
| airport_fee has 24649071 (> 99.9%) missing values | Missing |
| trip_distance is highly skewed (γ1 = 738.5508544) | Skewed |
| RatecodeID is highly skewed (γ1 = 104.9910041) | Skewed |
| fare_amount is highly skewed (γ1 = 2856.264782) | Skewed |
| extra is highly skewed (γ1 = 4963.651562) | Skewed |
| mta_tax is highly skewed (γ1 = 4964.781006) | Skewed |
| tip_amount is highly skewed (γ1 = 26.07406101) | Skewed |
| tolls_amount is highly skewed (γ1 = 55.09558913) | Skewed |
| total_amount is highly skewed (γ1 = 2523.557184) | Skewed |
| passenger_count has 489385 (2.0%) zeros | Zeros |
| trip_distance has 330110 (1.3%) zeros | Zeros |
| payment_type has 809967 (3.3%) zeros | Zeros |
| extra has 10273258 (41.7%) zeros | Zeros |
| tip_amount has 7368621 (29.9%) zeros | Zeros |
| tolls_amount has 23523820 (95.4%) zeros | Zeros |
| congestion_surcharge has 2033937 (8.3%) zeros | Zeros |
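The very large γ1 values reported for several columns are what a handful of extreme records produce. As a minimal sketch, assuming the plain moment-based coefficient g1 = m3 / m2^1.5 (the profiler may apply a sample-size correction, so this does not reproduce its exact numbers), a single outlier in otherwise constant made-up data is enough to give a large skewness:

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0] * 999 + [1000.0]   # made-up data: one extreme value

print(skewness(symmetric))      # zero for symmetric data
print(skewness(with_outlier))   # large positive value
```

This suggests that the skewness alerts here reflect outliers such as the maxima near 500000 or 1000000 rather than the shape of the bulk of the data.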
Reproduction
| Analysis started | 2022-09-30 03:33:06.320332 |
|---|---|
| Analysis finished | 2022-09-30 04:36:45.245351 |
| Duration | 1 hour, 3 minutes and 38.93 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
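The reported duration follows directly from the two timestamps. A small check using only the standard library, with the timestamp format string being an assumption about how the values are written:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-09-30 03:33:06.320332", fmt)
finished = datetime.strptime("2022-09-30 04:36:45.245351", fmt)

# Split the elapsed time into hours, minutes and seconds.
total_seconds = (finished - started).total_seconds()
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{int(hours)} hour, {int(minutes)} minutes and {seconds:.2f} seconds")
```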
df_index
Real number (ℝ≥0)
| Distinct | 6405008 |
|---|---|
| Distinct (%) | 26.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2047270.132 |
| Minimum | 0 |
| Maximum | 6405007 |
| Zeros | 12 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 102704 |
| Q1 | 558443 |
| median | 1340080 |
| Q3 | 3271050.25 |
| 95-th percentile | 5735959.45 |
| Maximum | 6405007 |
| Range | 6405007 |
| Interquartile range (IQR) | 2712607.25 |
Descriptive statistics
| Standard deviation | 1822005.672 |
|---|---|
| Coefficient of variation (CV) | 0.8899683748 |
| Kurtosis | -0.5348068312 |
| Mean | 2047270.132 |
| Median Absolute Deviation (MAD) | 1024731 |
| Skewness | 0.8479502783 |
| Sum | 5.046334982 × 10^13 |
| Variance | 3.319704668 × 10^12 |
| Monotonicity | Not monotonic |
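Several entries in the quantile and descriptive tables are simple functions of the others. The check below uses the df_index figures quoted above and assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum):

```python
# Figures copied from the df_index tables in this report.
mean = 2047270.132
std = 1822005.672
q1, q3 = 558443, 3271050.25
minimum, maximum = 0, 6405007

cv = std / mean              # coefficient of variation
variance = std ** 2
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(f"{cv:.4f}")
print(f"{variance:.4e}")
print(iqr)
print(value_range)
```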
| Value | Count | Frequency (%) |
| 0 | 12 | < 0.1% |
| 140695 | 12 | < 0.1% |
| 165849 | 12 | < 0.1% |
| 182225 | 12 | < 0.1% |
| 133065 | 12 | < 0.1% |
| 149441 | 12 | < 0.1% |
| 100281 | 12 | < 0.1% |
| 116657 | 12 | < 0.1% |
| 67497 | 12 | < 0.1% |
| 83873 | 12 | < 0.1% |
| Other values (6404998) | 24648972 | > 99.9% |
| Value | Count | Frequency (%) |
| 0 | 12 | |
| 1 | 12 | |
| 2 | 12 | |
| 3 | 12 | |
| 4 | 12 | |
| 5 | 12 | |
| 6 | 12 | |
| 7 | 12 | |
| 8 | 12 | |
| 9 | 12 |
| Value | Count | Frequency (%) |
| 6405007 | 1 | |
| 6405006 | 1 | |
| 6405005 | 1 | |
| 6405004 | 1 | |
| 6405003 | 1 | |
| 6405002 | 1 | |
| 6405001 | 1 | |
| 6405000 | 1 | |
| 6404999 | 1 | |
| 6404998 | 1 |
VendorID
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.1 MiB |
| Value | Count |
|---|---|
| 2 | 16599044 |
| 1 | 8004823 |
| 6 | 45097 |
| 5 | 128 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 24649092 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 2 |
| 3rd row | 1 |
| 4th row | 2 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 16599044 | 67.3% |
| 1 | 8004823 | 32.5% |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2 | 16599044 | |
| 1 | 8004823 | |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 16599044 | |
| 1 | 8004823 | |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 24649092 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 16599044 | |
| 1 | 8004823 | |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 24649092 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 16599044 | |
| 1 | 8004823 | |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24649092 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 16599044 | |
| 1 | 8004823 | |
| 6 | 45097 | 0.2% |
| 5 | 128 | < 0.1% |
tpep_pickup_datetime
| Distinct | 11776036 |
|---|---|
| Distinct (%) | 47.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.1 MiB |
| Minimum | 2002-12-31 23:06:55 |
| Maximum | 2021-06-10 10:10:48 |
tpep_dropoff_datetime
| Distinct | 11776414 |
|---|---|
| Distinct (%) | 47.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.1 MiB |
| Minimum | 2002-12-31 23:08:03 |
| Maximum | 2021-06-10 10:41:42 |
passenger_count
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 809967 |
| Missing (%) | 3.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.467982655 |
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 489385 |
| Zeros (%) | 2.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.112779455 |
|---|---|
| Coefficient of variation (CV) | 0.7580331083 |
| Kurtosis | 6.349505884 |
| Mean | 1.467982655 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.556566397 |
| Sum | 34995422 |
| Variance | 1.238278114 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 17511386 | 71.0% |
| 2 | 3349141 | 13.6% |
| 3 | 872659 | 3.5% |
| 5 | 751719 | 3.0% |
| 0 | 489385 | 2.0% |
| 6 | 474541 | 1.9% |
| 4 | 390094 | 1.6% |
| 7 | 91 | < 0.1% |
| 8 | 58 | < 0.1% |
| 9 | 51 | < 0.1% |
| (Missing) | 809967 | 3.3% |
| Value | Count | Frequency (%) |
| 0 | 489385 | 2.0% |
| 1 | 17511386 | |
| 2 | 3349141 | 13.6% |
| 3 | 872659 | 3.5% |
| 4 | 390094 | 1.6% |
| 5 | 751719 | 3.0% |
| 6 | 474541 | 1.9% |
| 7 | 91 | < 0.1% |
| 8 | 58 | < 0.1% |
| 9 | 51 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 51 | < 0.1% |
| 8 | 58 | < 0.1% |
| 7 | 91 | < 0.1% |
| 6 | 474541 | 1.9% |
| 5 | 751719 | 3.0% |
| 4 | 390094 | 1.6% |
| 3 | 872659 | 3.5% |
| 2 | 3349141 | 13.6% |
| 1 | 17511386 | |
| 0 | 489385 | 2.0% |
trip_distance
| Distinct | 7375 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.527101448 |
| Minimum | -30.62 |
| Maximum | 350914.89 |
| Zeros | 330110 |
| Zeros (%) | 1.3% |
| Negative | 2338 |
| Negative (%) | < 0.1% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -30.62 |
|---|---|
| 5-th percentile | 0.45 |
| Q1 | 0.99 |
| median | 1.65 |
| Q3 | 3 |
| 95-th percentile | 10.38 |
| Maximum | 350914.89 |
| Range | 350945.51 |
| Interquartile range (IQR) | 2.01 |
Descriptive statistics
| Standard deviation | 325.0319578 |
|---|---|
| Coefficient of variation (CV) | 92.15271025 |
| Kurtosis | 649867.1222 |
| Mean | 3.527101448 |
| Median Absolute Deviation (MAD) | 0.84 |
| Skewness | 738.5508544 |
| Sum | 86939848.09 |
| Variance | 105645.7736 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.9 | 433901 | 1.8% |
| 0.8 | 432751 | 1.8% |
| 1 | 427480 | 1.7% |
| 1.1 | 410856 | 1.7% |
| 0.7 | 409927 | 1.7% |
| 1.2 | 391452 | 1.6% |
| 1.3 | 369497 | 1.5% |
| 0.6 | 362629 | 1.5% |
| 1.4 | 346212 | 1.4% |
| 0 | 330110 | 1.3% |
| Other values (7365) | 20734277 | 84.1% |
| Value | Count | Frequency (%) |
| -30.62 | 2 | |
| -29.47 | 1 | < 0.1% |
| -29.23 | 1 | < 0.1% |
| -29.1 | 2 | |
| -29.09 | 1 | < 0.1% |
| -29.07 | 2 | |
| -29.06 | 3 | |
| -27.97 | 1 | < 0.1% |
| -27.32 | 2 | |
| -27.27 | 2 |
| Value | Count | Frequency (%) |
| 350914.89 | 1 | |
| 350814.14 | 1 | |
| 350793.6 | 1 | |
| 350722.34 | 1 | |
| 350696.98 | 1 | |
| 350104.58 | 1 | |
| 349987.05 | 1 | |
| 349692.3 | 1 | |
| 297004.51 | 1 | |
| 275196.59 | 1 |
RatecodeID
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 809967 |
| Missing (%) | 3.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.048557361 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.761083543 |
|---|---|
| Coefficient of variation (CV) | 0.7258387297 |
| Kurtosis | 13408.51976 |
| Mean | 1.048557361 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 104.9910041 |
| Sum | 24996690 |
| Variance | 0.5792481594 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 23231656 | 94.2% |
| 2 | 428170 | 1.7% |
| 5 | 119963 | 0.5% |
| 3 | 39440 | 0.2% |
| 4 | 18581 | 0.1% |
| 99 | 1165 | < 0.1% |
| 6 | 150 | < 0.1% |
| (Missing) | 809967 | 3.3% |
| Value | Count | Frequency (%) |
| 1 | 23231656 | |
| 2 | 428170 | 1.7% |
| 3 | 39440 | 0.2% |
| 4 | 18581 | 0.1% |
| 5 | 119963 | 0.5% |
| 6 | 150 | < 0.1% |
| 99 | 1165 | < 0.1% |
| Value | Count | Frequency (%) |
| 99 | 1165 | < 0.1% |
| 6 | 150 | < 0.1% |
| 5 | 119963 | 0.5% |
| 4 | 18581 | 0.1% |
| 3 | 39440 | 0.2% |
| 2 | 428170 | 1.7% |
| 1 | 23231656 |
store_and_fwd_flag
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 809967 |
| Missing (%) | 3.3% |
| Memory size | 47.0 MiB |
| Value | Count |
|---|---|
| False | 23593792 |
| True | 245333 |
| (Missing) | 809967 |
| Value | Count | Frequency (%) |
| False | 23593792 | 95.7% |
| True | 245333 | 1.0% |
| (Missing) | 809967 | 3.3% |
PULocationID
Real number (ℝ≥0)
| Distinct | 262 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 163.9707148 |
| Minimum | 1 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 114 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 120 |
Descriptive statistics
| Standard deviation | 66.75225692 |
|---|---|
| Coefficient of variation (CV) | 0.4070986517 |
| Kurtosis | -0.9337534246 |
| Mean | 163.9707148 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | -0.2776446138 |
| Sum | 4041729235 |
| Variance | 4455.863804 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 237 | 1145412 | 4.6% |
| 236 | 1089583 | 4.4% |
| 161 | 946854 | 3.8% |
| 186 | 862321 | 3.5% |
| 162 | 831592 | 3.4% |
| 170 | 754603 | 3.1% |
| 142 | 747187 | 3.0% |
| 48 | 730150 | 3.0% |
| 239 | 704016 | 2.9% |
| 141 | 688205 | 2.8% |
| Other values (252) | 16149169 | 65.5% |
| Value | Count | Frequency (%) |
| 1 | 2114 | < 0.1% |
| 2 | 25 | < 0.1% |
| 3 | 1865 | < 0.1% |
| 4 | 35561 | |
| 5 | 109 | < 0.1% |
| 6 | 142 | < 0.1% |
| 7 | 34971 | |
| 8 | 263 | < 0.1% |
| 9 | 1234 | < 0.1% |
| 10 | 8864 | < 0.1% |
| Value | Count | Frequency (%) |
| 265 | 57887 | 0.2% |
| 264 | 166696 | 0.7% |
| 263 | 573815 | |
| 262 | 361985 | |
| 261 | 110680 | 0.4% |
| 260 | 12824 | 0.1% |
| 259 | 2706 | < 0.1% |
| 258 | 2469 | < 0.1% |
| 257 | 1510 | < 0.1% |
| 256 | 10804 | < 0.1% |
DOLocationID
Real number (ℝ≥0)
| Distinct | 263 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 161.1702657 |
| Minimum | 1 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 42 |
| Q1 | 107 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 260 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 127 |
Descriptive statistics
| Standard deviation | 70.95646814 |
|---|---|
| Coefficient of variation (CV) | 0.4402578094 |
| Kurtosis | -1.014689305 |
| Mean | 161.1702657 |
| Median Absolute Deviation (MAD) | 69 |
| Skewness | -0.3129428061 |
| Sum | 3972700708 |
| Variance | 5034.82037 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 236 | 1126419 | 4.6% |
| 237 | 1015502 | 4.1% |
| 161 | 843739 | 3.4% |
| 170 | 738636 | 3.0% |
| 141 | 686809 | 2.8% |
| 142 | 670973 | 2.7% |
| 239 | 667836 | 2.7% |
| 162 | 666196 | 2.7% |
| 48 | 640791 | 2.6% |
| 238 | 608220 | 2.5% |
| Other values (253) | 16983971 | 68.9% |
| Value | Count | Frequency (%) |
| 1 | 34575 | 0.1% |
| 2 | 45 | < 0.1% |
| 3 | 3808 | < 0.1% |
| 4 | 109252 | |
| 5 | 312 | < 0.1% |
| 6 | 453 | < 0.1% |
| 7 | 90402 | |
| 8 | 528 | < 0.1% |
| 9 | 2903 | < 0.1% |
| 10 | 20458 | 0.1% |
| Value | Count | Frequency (%) |
| 265 | 56919 | 0.2% |
| 264 | 152443 | 0.6% |
| 263 | 541634 | |
| 262 | 380024 | |
| 261 | 93041 | 0.4% |
| 260 | 28701 | 0.1% |
| 259 | 5879 | < 0.1% |
| 258 | 7347 | < 0.1% |
| 257 | 10210 | < 0.1% |
| 256 | 50300 | 0.2% |
payment_type
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.23833101 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 809967 |
| Zeros (%) | 3.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 2 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.5282318102 |
|---|---|
| Coefficient of variation (CV) | 0.4265675379 |
| Kurtosis | 2.323307471 |
| Mean | 1.23833101 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.9545461655 |
| Sum | 30523735 |
| Variance | 0.2790288453 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 17463775 | 70.8% |
| 2 | 6148485 | 24.9% |
| 0 | 809967 | 3.3% |
| 3 | 144485 | 0.6% |
| 4 | 82365 | 0.3% |
| 5 | 15 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 809967 | 3.3% |
| 1 | 17463775 | |
| 2 | 6148485 | 24.9% |
| 3 | 144485 | 0.6% |
| 4 | 82365 | 0.3% |
| 5 | 15 | < 0.1% |
| Value | Count | Frequency (%) |
| 5 | 15 | < 0.1% |
| 4 | 82365 | 0.3% |
| 3 | 144485 | 0.6% |
| 2 | 6148485 | 24.9% |
| 1 | 17463775 | |
| 0 | 809967 | 3.3% |
fare_amount
| Distinct | 10302 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.66777977 |
| Minimum | -1259 |
| Maximum | 998310.03 |
| Zeros | 10994 |
| Zeros (%) | < 0.1% |
| Negative | 92833 |
| Negative (%) | 0.4% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -1259 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 6.5 |
| median | 9 |
| Q3 | 14 |
| 95-th percentile | 35.5 |
| Maximum | 998310.03 |
| Range | 999569.03 |
| Interquartile range (IQR) | 7.5 |
Descriptive statistics
| Standard deviation | 274.0881567 |
|---|---|
| Coefficient of variation (CV) | 21.63663733 |
| Kurtosis | 9036085.551 |
| Mean | 12.66777977 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 2856.264782 |
| Sum | 312249269 |
| Variance | 75124.31762 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 1297558 | 5.3% |
| 6.5 | 1270995 | 5.2% |
| 5.5 | 1261888 | 5.1% |
| 7 | 1242564 | 5.0% |
| 5 | 1172501 | 4.8% |
| 7.5 | 1161815 | 4.7% |
| 8 | 1105710 | 4.5% |
| 8.5 | 1021014 | 4.1% |
| 4.5 | 956452 | 3.9% |
| 9 | 949970 | 3.9% |
| Other values (10292) | 13208625 | 53.6% |
| Value | Count | Frequency (%) |
| -1259 | 1 | |
| -1238 | 1 | |
| -750 | 1 | |
| -730 | 1 | |
| -500 | 2 | |
| -497 | 1 | |
| -490 | 1 | |
| -480 | 1 | |
| -450 | 1 | |
| -445 | 1 |
| Value | Count | Frequency (%) |
| 998310.03 | 1 | |
| 671100.14 | 1 | |
| 429496.72 | 1 | |
| 398464.88 | 1 | |
| 187438.96 | 1 | |
| 151504.45 | 1 | |
| 6964 | 1 | |
| 6052 | 1 | |
| 4265 | 1 | |
| 3014.5 | 1 |
extra
| Distinct | 447 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.071846049 |
| Minimum | -27 |
| Maximum | 500000.8 |
| Zeros | 10273258 |
| Zeros (%) | 41.7% |
| Negative | 41620 |
| Negative (%) | 0.2% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -27 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.5 |
| Q3 | 2.5 |
| 95-th percentile | 3.5 |
| Maximum | 500000.8 |
| Range | 500027.8 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 100.7169073 |
|---|---|
| Coefficient of variation (CV) | 93.96583341 |
| Kurtosis | 24641587.97 |
| Mean | 1.071846049 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 4963.651562 |
| Sum | 26420031.88 |
| Variance | 10143.89542 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 10273258 | 41.7% |
| 2.5 | 4187100 | 17.0% |
| 0.5 | 3921268 | 15.9% |
| 1 | 3007285 | 12.2% |
| 3 | 1606342 | 6.5% |
| 3.5 | 1414608 | 5.7% |
| 2.75 | 106280 | 0.4% |
| 4.5 | 59976 | 0.2% |
| -0.5 | 25945 | 0.1% |
| 7 | 20393 | 0.1% |
| Other values (437) | 26637 | 0.1% |
| Value | Count | Frequency (%) |
| -27 | 1 | < 0.1% |
| -26.5 | 1 | < 0.1% |
| -17.69 | 1 | < 0.1% |
| -7 | 2 | < 0.1% |
| -4.5 | 846 | |
| -3.5 | 6 | < 0.1% |
| -3 | 7 | < 0.1% |
| -2.5 | 17 | < 0.1% |
| -2 | 25 | < 0.1% |
| -1.3 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 500000.8 | 1 | < 0.1% |
| 113.01 | 1 | < 0.1% |
| 90.06 | 3 | |
| 87.56 | 7 | |
| 65.53 | 1 | < 0.1% |
| 55.79 | 1 | < 0.1% |
| 52.5 | 1 | < 0.1% |
| 47.4 | 1 | < 0.1% |
| 42.5 | 1 | < 0.1% |
| 36.09 | 1 | < 0.1% |
mta_tax
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.51279657 |
| Minimum | -0.5 |
| Maximum | 500000.5 |
| Zeros | 188223 |
| Zeros (%) | 0.8% |
| Negative | 90730 |
| Negative (%) | 0.4% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -0.5 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 0.5 |
| median | 0.5 |
| Q3 | 0.5 |
| 95-th percentile | 0.5 |
| Maximum | 500000.5 |
| Range | 500001 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 100.7093215 |
|---|---|
| Coefficient of variation (CV) | 196.3923462 |
| Kurtosis | 24649064.29 |
| Mean | 0.51279657 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4964.781006 |
| Sum | 12639969.83 |
| Variance | 10142.36743 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.5 | 24370052 | 98.9% |
| 0 | 188223 | 0.8% |
| -0.5 | 90730 | 0.4% |
| 3.3 | 42 | < 0.1% |
| 0.35 | 10 | < 0.1% |
| 0.32 | 8 | < 0.1% |
| 1.1 | 6 | < 0.1% |
| 3 | 3 | < 0.1% |
| 30.8 | 2 | < 0.1% |
| 2.5 | 2 | < 0.1% |
| Other values (14) | 14 | < 0.1% |
| Value | Count | Frequency (%) |
| -0.5 | 90730 | 0.4% |
| 0 | 188223 | 0.8% |
| 0.32 | 8 | < 0.1% |
| 0.35 | 10 | < 0.1% |
| 0.5 | 24370052 | |
| 0.59 | 1 | < 0.1% |
| 0.83 | 1 | < 0.1% |
| 0.9 | 1 | < 0.1% |
| 1.1 | 6 | < 0.1% |
| 1.15 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 500000.5 | 1 | < 0.1% |
| 39.51 | 1 | < 0.1% |
| 30.8 | 2 | < 0.1% |
| 18.49 | 1 | < 0.1% |
| 6.8 | 1 | < 0.1% |
| 3.3 | 42 | |
| 3.25 | 1 | < 0.1% |
| 3 | 3 | < 0.1% |
| 2.8 | 1 | < 0.1% |
| 2.74 | 1 | < 0.1% |
tip_amount
| Distinct | 5196 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.082026641 |
| Minimum | -493.22 |
| Maximum | 1393.56 |
| Zeros | 7368621 |
| Zeros (%) | 29.9% |
| Negative | 1013 |
| Negative (%) | < 0.1% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -493.22 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.92 |
| Q3 | 2.76 |
| 95-th percentile | 5.86 |
| Maximum | 1393.56 |
| Range | 1886.78 |
| Interquartile range (IQR) | 2.76 |
Descriptive statistics
| Standard deviation | 2.610752953 |
|---|---|
| Coefficient of variation (CV) | 1.253947909 |
| Kurtosis | 7264.577101 |
| Mean | 2.082026641 |
| Median Absolute Deviation (MAD) | 1.16 |
| Skewness | 26.07406101 |
| Sum | 51320066.22 |
| Variance | 6.816030981 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 7368621 | 29.9% |
| 1 | 1373765 | 5.6% |
| 2 | 934709 | 3.8% |
| 2.75 | 481415 | 2.0% |
| 2.06 | 342562 | 1.4% |
| 1.96 | 338258 | 1.4% |
| 2.16 | 329021 | 1.3% |
| 1.5 | 328898 | 1.3% |
| 1.86 | 327093 | 1.3% |
| 2.26 | 315058 | 1.3% |
| Other values (5186) | 12509692 | 50.8% |
| Value | Count | Frequency (%) |
| -493.22 | 1 | |
| -111 | 1 | |
| -103.06 | 1 | |
| -93 | 1 | |
| -91 | 1 | |
| -87.77 | 1 | |
| -87 | 1 | |
| -70 | 1 | |
| -68.02 | 1 | |
| -63 | 1 |
| Value | Count | Frequency (%) |
| 1393.56 | 1 | |
| 1100 | 1 | |
| 1001 | 1 | |
| 800 | 1 | |
| 593 | 1 | |
| 550 | 1 | |
| 549.02 | 1 | |
| 500 | 2 | |
| 493.22 | 1 | |
| 480 | 1 |
tolls_amount
| Distinct | 1915 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3038689531 |
| Minimum | -40 |
| Maximum | 925.5 |
| Zeros | 23523820 |
| Zeros (%) | 95.4% |
| Negative | 1744 |
| Negative (%) | < 0.1% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -40 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 925.5 |
| Range | 965.5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.604901844 |
|---|---|
| Coefficient of variation (CV) | 5.281559132 |
| Kurtosis | 24424.79912 |
| Mean | 0.3038689531 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 55.09558913 |
| Sum | 7490093.78 |
| Variance | 2.575709928 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 23523820 | 95.4% |
| 6.12 | 1010858 | 4.1% |
| 2.8 | 18663 | 0.1% |
| 11.75 | 15333 | 0.1% |
| 12.24 | 14554 | 0.1% |
| 13.75 | 11823 | < 0.1% |
| 2.29 | 5450 | < 0.1% |
| 18.36 | 3620 | < 0.1% |
| 8.41 | 2381 | < 0.1% |
| 18.75 | 1574 | < 0.1% |
| Other values (1905) | 41016 | 0.2% |
| Value | Count | Frequency (%) |
| -40 | 1 | |
| -38.23 | 1 | |
| -35.74 | 1 | |
| -32.74 | 1 | |
| -30 | 1 | |
| -29.62 | 1 | |
| -28.75 | 1 | |
| -27.5 | 2 | |
| -27 | 1 | |
| -25.99 | 2 |
| Value | Count | Frequency (%) |
| 925.5 | 1 | |
| 911.75 | 1 | |
| 910.5 | 1 | |
| 853.55 | 1 | |
| 831.75 | 1 | |
| 700.87 | 1 | |
| 612 | 1 | |
| 601.02 | 1 | |
| 600.04 | 1 | |
| 600 | 1 |
improvement_surcharge
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 188.1 MiB |
| Value | Count |
|---|---|
| 0.3 | 24540712 |
| -0.3 | 92463 |
| 0.0 | 15917 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.003751173 |
| Min length | 3 |
Characters and Unicode
| Total characters | 74039739 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.3 |
|---|---|
| 2nd row | 0.3 |
| 3rd row | 0.3 |
| 4th row | 0.3 |
| 5th row | 0.3 |
Common Values
| Value | Count | Frequency (%) |
| 0.3 | 24540712 | 99.6% |
| -0.3 | 92463 | 0.4% |
| 0.0 | 15917 | 0.1% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.3 | 24633175 | |
| 0.0 | 15917 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 24665009 | |
| . | 24649092 | |
| 3 | 24633175 | |
| - | 92463 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 49298184 | |
| Other Punctuation | 24649092 | |
| Dash Punctuation | 92463 | 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 24665009 | |
| 3 | 24633175 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 24649092 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 92463 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 74039739 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 24665009 | |
| . | 24649092 | |
| 3 | 24633175 | |
| - | 92463 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 74039739 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 24665009 | |
| . | 24649092 | |
| 3 | 24633175 | |
| - | 92463 | 0.1% |
total_amount
| Distinct | 18162 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.4217469 |
| Minimum | -1260.3 |
| Maximum | 1000003.8 |
| Zeros | 7457 |
| Zeros (%) | < 0.1% |
| Negative | 92683 |
| Negative (%) | 0.4% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -1260.3 |
|---|---|
| 5-th percentile | 8.15 |
| Q1 | 11.16 |
| median | 14.3 |
| Q3 | 19.8 |
| 95-th percentile | 45.88 |
| Maximum | 1000003.8 |
| Range | 1001264.1 |
| Interquartile range (IQR) | 8.64 |
Descriptive statistics
| Standard deviation | 340.2244935 |
|---|---|
| Coefficient of variation (CV) | 18.46863358 |
| Kurtosis | 6833943.296 |
| Mean | 18.4217469 |
| Median Absolute Deviation (MAD) | 3.98 |
| Skewness | 2523.557184 |
| Sum | 454079334.2 |
| Variance | 115752.706 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9.8 | 478113 | 1.9% |
| 10.3 | 475155 | 1.9% |
| 9.3 | 472581 | 1.9% |
| 10.8 | 460624 | 1.9% |
| 8.8 | 447660 | 1.8% |
| 11.3 | 439171 | 1.8% |
| 11.8 | 417338 | 1.7% |
| 12.3 | 392620 | 1.6% |
| 8.3 | 385668 | 1.6% |
| 12.8 | 369603 | 1.5% |
| Other values (18152) | 20310559 | 82.4% |
| Value | Count | Frequency (%) |
| -1260.3 | 1 | |
| -1242.3 | 1 | |
| -750.3 | 1 | |
| -730.3 | 1 | |
| -502.8 | 1 | |
| -502.02 | 1 | |
| -500.3 | 1 | |
| -497.3 | 1 | |
| -490.3 | 1 | |
| -480.8 | 1 |
| Value | Count | Frequency (%) |
| 1000003.8 | 1 | |
| 998325.61 | 1 | |
| 671103.17 | 1 | |
| 429562.25 | 1 | |
| 398467.7 | 1 | |
| 187443.26 | 1 | |
| 151522.07 | 1 | |
| 8361.36 | 1 | |
| 6061.42 | 1 | |
| 4268.3 | 1 |
congestion_surcharge
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 809967 |
| Missing (%) | 3.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.271167075 |
| Minimum | -2.5 |
| Maximum | 3 |
| Zeros | 2033937 |
| Zeros (%) | 8.3% |
| Negative | 74015 |
| Negative (%) | 0.3% |
| Memory size | 188.1 MiB |
Quantile statistics
| Minimum | -2.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.5 |
| median | 2.5 |
| Q3 | 2.5 |
| 95-th percentile | 2.5 |
| Maximum | 3 |
| Range | 5.5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7473420881 |
|---|---|
| Coefficient of variation (CV) | 0.3290564117 |
| Kurtosis | 9.442761087 |
| Mean | 2.271167075 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.176351933 |
| Sum | 54142635.8 |
| Variance | 0.5585201967 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2.5 | 21730865 | 88.2% |
| 0 | 2033937 | 8.3% |
| -2.5 | 74011 | 0.3% |
| 0.75 | 160 | < 0.1% |
| 2.75 | 134 | < 0.1% |
| 0.5 | 5 | < 0.1% |
| 1 | 4 | < 0.1% |
| -0.75 | 4 | < 0.1% |
| 1.5 | 2 | < 0.1% |
| 3 | 1 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
| (Missing) | 809967 | 3.3% |
| Value | Count | Frequency (%) |
| -2.5 | 74011 | 0.3% |
| -0.75 | 4 | < 0.1% |
| 0 | 2033937 | 8.3% |
| 0.5 | 5 | < 0.1% |
| 0.75 | 160 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 1 | 4 | < 0.1% |
| 1.5 | 2 | < 0.1% |
| 2 | 1 | < 0.1% |
| 2.5 | 21730865 |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 2.75 | 134 | < 0.1% |
| 2.5 | 21730865 | |
| 2 | 1 | < 0.1% |
| 1.5 | 2 | < 0.1% |
| 1 | 4 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 0.75 | 160 | < 0.1% |
| 0.5 | 5 | < 0.1% |
| 0 | 2033937 | 8.3% |
airport_fee
| Distinct | 1 |
|---|---|
| Distinct (%) | 4.8% |
| Missing | 24649071 |
| Missing (%) | > 99.9% |
| Memory size | 188.1 MiB |
| Value | Count |
|---|---|
| 0.0 | 21 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 63 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 21 | < 0.1% |
| (Missing) | 24649071 | > 99.9% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.0 | 21 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 42 | |
| . | 21 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 42 | |
| Other Punctuation | 21 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 42 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 21 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 63 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 42 | |
| . | 21 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 63 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 42 | |
| . | 21 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
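The calculation described above can be sketched in plain Python: rank each variable, then apply the Pearson formula to the ranks. This is an illustration with made-up data and hypothetical helper names (`ranks`, `pearson`, `spearman`), not the profiler's own implementation; ties are given average ranks, which is one common convention.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear (made-up data)

print(spearman(x, y))     # essentially 1: the relationship is perfectly monotonic
print(pearson(x, y))      # below 1: the relationship is not linear
```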
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
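A minimal sketch of that calculation, together with a check of the invariance property, is shown below. The data and the shift and scale constants are made up for illustration, and the scale factors are kept positive so that the sign of r is preserved.

```python
def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up, roughly linear data

r = pearson(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_transformed)   # the two values agree up to floating-point error
```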
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
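The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements it directly on made-up data; tied pairs count as neither concordant nor discordant, and library implementations often use a tie-corrected variant (tau-b) instead, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]   # made-up data: 7 concordant and 3 discordant pairs

print(kendall_tau(x, y))
```

The double loop is quadratic in the number of observations, so it is only suitable for small illustrative inputs.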
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | df_index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2020-03-01 00:31:13 | 2020-03-01 01:01:42 | 1.0 | 4.70 | 1.0 | N | 88 | 255 | 1 | 22.0 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 27.80 | 2.5 | None |
| 1 | 1 | 2 | 2020-03-01 00:08:22 | 2020-03-01 00:08:49 | 1.0 | 0.00 | 1.0 | N | 193 | 193 | 2 | 2.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 3.80 | 0.0 | None |
| 2 | 2 | 1 | 2020-03-01 00:52:18 | 2020-03-01 00:59:16 | 1.0 | 1.10 | 1.0 | N | 246 | 90 | 1 | 6.0 | 3.0 | 0.5 | 1.95 | 0.0 | 0.3 | 11.75 | 2.5 | None |
| 3 | 3 | 2 | 2020-03-01 00:47:53 | 2020-03-01 00:50:57 | 2.0 | 0.87 | 1.0 | N | 151 | 238 | 1 | 5.0 | 0.5 | 0.5 | 1.76 | 0.0 | 0.3 | 10.56 | 2.5 | None |
| 4 | 4 | 1 | 2020-03-01 00:43:19 | 2020-03-01 00:58:27 | 0.0 | 4.40 | 1.0 | N | 79 | 261 | 1 | 16.5 | 3.0 | 0.5 | 4.05 | 0.0 | 0.3 | 24.35 | 2.5 | None |
| 5 | 5 | 1 | 2020-03-01 00:04:43 | 2020-03-01 00:23:17 | 1.0 | 3.50 | 1.0 | Y | 113 | 142 | 1 | 15.0 | 3.0 | 0.5 | 3.75 | 0.0 | 0.3 | 22.55 | 2.5 | None |
| 6 | 6 | 1 | 2020-03-01 00:43:21 | 2020-03-01 01:14:36 | 1.0 | 14.10 | 1.0 | Y | 237 | 14 | 1 | 40.5 | 3.0 | 0.5 | 8.85 | 0.0 | 0.3 | 53.15 | 2.5 | None |
| 7 | 7 | 1 | 2020-03-01 00:51:35 | 2020-03-01 01:00:17 | 1.0 | 1.00 | 1.0 | N | 234 | 114 | 1 | 7.0 | 3.0 | 0.5 | 1.30 | 0.0 | 0.3 | 12.10 | 2.5 | None |
| 8 | 8 | 1 | 2020-03-01 00:13:42 | 2020-03-01 00:23:00 | 4.0 | 1.10 | 1.0 | N | 148 | 211 | 1 | 7.5 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 13.30 | 2.5 | None |
| 9 | 9 | 1 | 2020-03-01 00:25:05 | 2020-03-01 00:31:06 | 2.0 | 1.30 | 1.0 | N | 211 | 249 | 1 | 6.5 | 3.0 | 0.5 | 2.00 | 0.0 | 0.3 | 12.30 | 2.5 | None |
Last rows
| | df_index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24649082 | 549787 | 2 | 2020-06-30 23:30:59 | 2020-07-01 00:00:40 | NaN | 8.32 | NaN | None | 165 | 232 | 0 | 45.65 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 49.20 | NaN | None |
| 24649083 | 549788 | 2 | 2020-06-30 23:16:23 | 2020-07-01 00:10:03 | NaN | 44.49 | NaN | None | 44 | 259 | 0 | 124.58 | 0.0 | 0.5 | 2.75 | 6.12 | 0.3 | 134.25 | NaN | None |
| 24649084 | 549789 | 2 | 2020-06-30 23:20:22 | 2020-06-30 23:31:10 | NaN | 3.30 | NaN | None | 107 | 256 | 0 | 23.56 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 27.11 | NaN | None |
| 24649085 | 549790 | 2 | 2020-06-30 23:54:00 | 2020-06-30 23:59:00 | NaN | 1.85 | NaN | None | 50 | 68 | 0 | 8.14 | 0.0 | 0.5 | 2.47 | 0.00 | 0.3 | 13.91 | NaN | None |
| 24649086 | 549791 | 2 | 2020-06-30 23:42:00 | 2020-06-30 23:58:00 | NaN | 3.10 | NaN | None | 36 | 72 | 0 | 13.81 | 0.0 | 0.5 | 4.94 | 0.00 | 0.3 | 19.55 | NaN | None |
| 24649087 | 549792 | 2 | 2020-06-30 23:05:00 | 2020-06-30 23:32:00 | NaN | 12.96 | NaN | None | 17 | 69 | 0 | 32.91 | 0.0 | 0.5 | 2.75 | 6.12 | 0.3 | 42.58 | NaN | None |
| 24649088 | 549793 | 2 | 2020-06-30 23:21:47 | 2020-06-30 23:25:24 | NaN | 0.36 | NaN | None | 41 | 41 | 0 | 11.45 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 15.00 | NaN | None |
| 24649089 | 549794 | 2 | 2020-06-30 23:34:00 | 2020-06-30 23:44:00 | NaN | 2.36 | NaN | None | 242 | 81 | 0 | 18.45 | 0.0 | 0.5 | 2.75 | 0.00 | 0.3 | 22.00 | NaN | None |
| 24649090 | 549795 | 2 | 2020-06-30 23:22:47 | 2020-06-30 23:42:01 | NaN | 5.50 | NaN | None | 14 | 118 | 0 | 15.90 | 0.0 | 0.5 | 6.23 | 12.24 | 0.3 | 35.17 | NaN | None |
| 24649091 | 549796 | 2 | 2020-06-30 23:56:18 | 2020-07-01 00:27:19 | NaN | 9.59 | NaN | None | 61 | 137 | 0 | 29.68 | 0.0 | 0.5 | 0.00 | 0.00 | 0.3 | 32.98 | NaN | None |